## here() starts at /Users/carriewright/Documents/GitHub/ocs-right-to-carry-case-study
Need to fix initialization problem
This case study will focus on conflicting findings from two papers.
It is important that we do not treat race as an objective measure. Despite this, it can be used to advance scientific inquiry. For more information on this topic, we have included a link to a paper on the use of race as a measure in epidemiology.
How does the inclusion of different numbers of age groups influence the results of an analysis of right to carry laws and violence rates?
This screenshot needs to be taken again. Cursor highlight is showing
We will evaluate how multicollinearity can influence linear regression results and result in different conclusions for Donohoe vs Lott on this very important topic. We will also discuss briefly how synthetic control methods can be used to assess the impact of policies by creating controls for comparison that did not have policy adoption but were otherwise similar.
This analysis will demonstrate how details about our methods can be critically influential for our overall conclusions and can result in important policy related consequences. This report will provide a basis for the motivation: https://www.nber.org/papers/w23510. As this is a historically controversial topic, we will focus on how different statistical methods can yield different results, but we will avoid making conclusions about right to carry laws.
Linear regression analysis, directed acyclic graphs, discussion about the influence of multicollinearity
## Loading required package: carData
library(plm) # fixed effect model, linear regression
library(broom) # tidy output
library(tidyverse) # general wrangling functions## ── Attaching packages ─────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.1 ✓ purrr 0.3.4
## ✓ tibble 3.0.1 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::between() masks plm::between()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks plm::lag(), stats::lag()
## x dplyr::lead() masks plm::lead()
## x dplyr::recode() masks car::recode()
## x purrr::some() masks car::some()
## Using poppler version 0.73.0
##
## ********************************************************
## Note: As of version 1.0.0, cowplot does not change the
## default ggplot2 theme anymore. To recover the previous
## behavior, execute:
## theme_set(theme_cowplot())
## ********************************************************
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
## Loading required package: viridisLite
##
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
##
## viridis_pal
Below is table from the Donohue, et al. paper.
The datasets below were available to the respective authors at the time of their analyses.
The following data was downloaded from the US Census Bureau
## # A tibble: 6 x 4
## Region Division `State\n(FIPS)` Name
## <chr> <chr> <chr> <chr>
## 1 1 0 00 Northeast Region
## 2 1 1 00 New England Division
## 3 1 1 09 Connecticut
## 4 1 1 23 Maine
## 5 1 1 25 Massachusetts
## 6 1 1 33 New Hampshire
## [1] "character"
The following data was downloaded from the US Census Bureau.
## Parsed with column specification:
## cols(
## .default = col_number(),
## `Year of Estimate` = col_double(),
## `FIPS State Code` = col_character(),
## `State Name` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## # A tibble: 6 x 22
## `Year of Estima… `FIPS State Cod… `State Name` `Race/Sex Indic…
## <dbl> <chr> <chr> <chr>
## 1 1970 01 Alabama White male
## 2 1970 01 Alabama White female
## 3 1970 01 Alabama Black male
## 4 1970 01 Alabama Black female
## 5 1970 01 Alabama Other races male
## 6 1970 01 Alabama Other races fem…
## # … with 18 more variables: `Under 5 years` <dbl>, `5 to 9 years` <dbl>, `10 to
## # 14 years` <dbl>, `15 to 19 years` <dbl>, `20 to 24 years` <dbl>, `25 to 29
## # years` <dbl>, `30 to 34 years` <dbl>, `35 to 39 years` <dbl>, `40 to 44
## # years` <dbl>, `45 to 49 years` <dbl>, `50 to 54 years` <dbl>, `55 to 59
## # years` <dbl>, `60 to 64 years` <dbl>, `65 to 69 years` <dbl>, `70 to 74
## # years` <dbl>, `75 to 79 years` <dbl>, `80 to 84 years` <dbl>, `85 years and
## # over` <dbl>
## [1] "Year of Estimate" "FIPS State Code" "State Name"
## [4] "Race/Sex Indicator" "Under 5 years" "5 to 9 years"
## [7] "10 to 14 years" "15 to 19 years" "20 to 24 years"
## [10] "25 to 29 years" "30 to 34 years" "35 to 39 years"
## [13] "40 to 44 years" "45 to 49 years" "50 to 54 years"
## [16] "55 to 59 years" "60 to 64 years" "65 to 69 years"
## [19] "70 to 74 years" "75 to 79 years" "80 to 84 years"
## [22] "85 years and over"
## [1] "numeric"
dem_77_79 <- dem_77_79 %>%
mutate(RACE = case_when(str_detect(`Race/Sex Indicator`,"Black") ~ "Black",
str_detect(`Race/Sex Indicator`,"White") ~ "White",
TRUE ~ "Other"),
SEX = case_when(str_detect(`Race/Sex Indicator`,"female") ~ "Female",
TRUE ~ "Male")) %>%
dplyr::select(-`Race/Sex Indicator`,-`FIPS State Code`)
dem_77_79 <- dem_77_79 %>%
rename("YEAR"=`Year of Estimate`,
"STATE"=`State Name`) %>%
filter(YEAR %in% 1977:1979)
dem_77_79 <- dem_77_79 %>%
pivot_longer(cols=contains("years"),
names_to = "AGE_GROUP",
values_to = "SUB_POP")
colnames(dem_77_79)## [1] "YEAR" "STATE" "RACE" "SEX" "AGE_GROUP" "SUB_POP"
pop_77_79 <- dem_77_79 %>%
group_by(YEAR, STATE) %>%
summarise("TOT_POP" = sum(SUB_POP), .groups = "drop")
colnames(pop_77_79)## [1] "YEAR" "STATE" "TOT_POP"
The following data was downloaded from the US Census Bureau.
County data was used for this decade.
dem_80_89 <- list.files(recursive = TRUE,
path = "docs/Demographics/Decade_1980/",
pattern = "*.csv",
full.names = TRUE) %>%
map(~read_csv(., skip=5))## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_double(),
## `FIPS State and County Codes` = col_character(),
## `Race/Sex Indicator` = col_character()
## )
## See spec(...) for full column specifications.
## Year of Estimate FIPS State and County Codes
## "numeric" "character"
## Race/Sex Indicator Under 5 years
## "character" "numeric"
## 5 to 9 years 10 to 14 years
## "numeric" "numeric"
## 15 to 19 years 20 to 24 years
## "numeric" "numeric"
## 25 to 29 years 30 to 34 years
## "numeric" "numeric"
## 35 to 39 years 40 to 44 years
## "numeric" "numeric"
## 45 to 49 years 50 to 54 years
## "numeric" "numeric"
## 55 to 59 years 60 to 64 years
## "numeric" "numeric"
## 65 to 69 years 70 to 74 years
## "numeric" "numeric"
## 75 to 79 years 80 to 84 years
## "numeric" "numeric"
## 85 years and over
## "numeric"
dem_80_89 <- dem_80_89 %>%
mutate(RACE = case_when(str_detect(`Race/Sex Indicator`,"Black") ~ "Black",
str_detect(`Race/Sex Indicator`,"White") ~ "White",
TRUE ~ "Other"),
SEX = case_when(str_detect(`Race/Sex Indicator`,"female") ~ "Female",
TRUE ~ "Male")) %>%
dplyr::select(-`Race/Sex Indicator`)
colnames(dem_80_89)## [1] "Year of Estimate" "FIPS State and County Codes"
## [3] "Under 5 years" "5 to 9 years"
## [5] "10 to 14 years" "15 to 19 years"
## [7] "20 to 24 years" "25 to 29 years"
## [9] "30 to 34 years" "35 to 39 years"
## [11] "40 to 44 years" "45 to 49 years"
## [13] "50 to 54 years" "55 to 59 years"
## [15] "60 to 64 years" "65 to 69 years"
## [17] "70 to 74 years" "75 to 79 years"
## [19] "80 to 84 years" "85 years and over"
## [21] "RACE" "SEX"
dem_80_89 <- dem_80_89 %>%
rename("YEAR"=`Year of Estimate`,
"STATEFP_temp"=`FIPS State and County Codes`) %>%
mutate(STATEFP = substr(STATEFP_temp, start = 1, stop = 2)) %>%
left_join(STATE_FIPS, by = "STATEFP") %>%
dplyr::select(-STATEFP)
dem_80_89 <- dem_80_89 %>%
pivot_longer(cols=contains("years"),
names_to = "AGE_GROUP",
values_to = "SUB_POP_temp") %>%
group_by(YEAR, STATE, AGE_GROUP, SEX, RACE) %>%
summarise(SUB_POP = sum(SUB_POP_temp), .groups="drop")
colnames(dem_80_89)## [1] "YEAR" "STATE" "AGE_GROUP" "SEX" "RACE" "SUB_POP"
pop_80_89 <- dem_80_89 %>%
group_by(YEAR, STATE) %>%
summarise("TOT_POP" = sum(SUB_POP), .groups = "drop")
colnames(pop_80_89)## [1] "YEAR" "STATE" "TOT_POP"
The following data was downloaded from the US Census Bureau.
dem_90_99 <- list.files(recursive = TRUE,
path = "docs/Demographics/Decade_1990/",
pattern = "*.txt",
full.names = TRUE) %>%
map(~read_table2(., skip = 14))## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female'
## => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' =>
## 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' =>
## 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' =>
## 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' =>
## 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh90.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh91.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh92.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh93.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh94.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh95.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh96.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh97.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh98.txt'
## Warning: Duplicated column names deduplicated: 'Male' => 'Male_1' [6], 'Female' => 'Female_1' [7], 'Male' => 'Male_2' [8], 'Female' => 'Female_2' [9], 'Male' => 'Male_3' [10], 'Female' => 'Female_3' [11], 'Male' => 'Male_4' [12], 'Female' => 'Female_4' [13], 'Male' => 'Male_5' [14], 'Female' => 'Female_5' [15], 'Male' => 'Male_6' [16], 'Female' => 'Female_6' [17], 'Male' => 'Male_7' [18], 'Female' => 'Female_7' [19]
## Parsed with column specification:
## cols(
## Year = col_double(),
## e = col_character(),
## Age = col_double(),
## Male = col_double(),
## Female = col_double(),
## Male_1 = col_double(),
## Female_1 = col_double(),
## Male_2 = col_double(),
## Female_2 = col_double(),
## Male_3 = col_double(),
## Female_3 = col_double(),
## Male_4 = col_double(),
## Female_4 = col_double(),
## Male_5 = col_double(),
## Female_5 = col_double(),
## Male_6 = col_double(),
## Female_6 = col_double(),
## Male_7 = col_double(),
## Female_7 = col_double()
## )
## Warning: 1 parsing failure.
## row col expected actual file
## 1 -- 19 columns 1 columns 'docs/Demographics/Decade_1990//sasrh99.txt'
dem_90_99_names <- list.files(recursive = TRUE,
path = "docs/Demographics/Decade_1990/",
pattern = "*.txt") %>%
str_extract("[9][0-9]") %>%
paste0("Year_19",.)
dem_90_99 <- dem_90_99 %>%
map_df(bind_rows)
colnames(dem_90_99)## [1] "Year" "e" "Age" "Male" "Female" "Male_1"
## [7] "Female_1" "Male_2" "Female_2" "Male_3" "Female_3" "Male_4"
## [13] "Female_4" "Male_5" "Female_5" "Male_6" "Female_6" "Male_7"
## [19] "Female_7"
## # A tibble: 6 x 19
## Year e Age Male Female Male_1 Female_1 Male_2 Female_2 Male_3 Female_3
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 NA <NA> NA NA NA NA NA NA NA NA NA
## 2 1990 01 0 20406 19101 9794 9414 103 90 192 170
## 3 1990 01 1 19393 18114 9475 9247 87 93 146 182
## 4 1990 01 2 18990 18043 9097 8837 97 100 175 160
## 5 1990 01 3 19246 17786 9002 8701 94 115 150 157
## 6 1990 01 4 19502 18366 9076 8989 108 114 168 178
## # … with 8 more variables: Male_4 <dbl>, Female_4 <dbl>, Male_5 <dbl>,
## # Female_5 <dbl>, Male_6 <dbl>, Female_6 <dbl>, Male_7 <dbl>, Female_7 <dbl>
colnames(dem_90_99) <- c("YEAR",
"STATEFP",
"Age",
"NH_W_M",
"NH_W_F",
"NH_B_M",
"NH_B_F",
"NH_AIAN_M",
"NH_AIAN_F",
"NH_API_M",
"NH_API_F",
"H_W_M",
"H_W_F",
"H_B_M",
"H_B_F",
"H_AIAN_M",
"H_AIAN_F",
"H_API_M",
"H_API_F")
dim(dem_90_99)## [1] 43870 19
dem_90_99 <- dem_90_99 %>%
mutate(W_M = NH_W_M + H_W_M,
W_F = NH_W_F + H_W_F,
B_M = NH_B_M + H_B_M,
B_F = NH_B_F + H_B_F,
AIAN_M = NH_AIAN_M + H_AIAN_M,
AIAN_F = NH_AIAN_F + H_AIAN_F,
API_M = NH_API_M + H_API_M,
API_F = NH_API_F + H_API_F,
n_na = rowSums(is.na(.))) %>%
dplyr::select(-starts_with("NH_"), -starts_with("H_"))
dem_90_99 %>%
group_by(n_na) %>%
tally()## # A tibble: 2 x 2
## n_na n
## <dbl> <int>
## 1 0 43860
## 2 19 10
empty_rows_na <- dem_90_99 %>%
group_by(n_na) %>%
tally() %>%
filter(n_na != 0) %>%
pull(n_na)
dem_90_99 <- dem_90_99 %>%
filter(n_na != empty_rows_na) %>%
dplyr::select(-n_na)
sapply(dem_90_99, class)## YEAR STATEFP Age W_M W_F B_M
## "numeric" "character" "numeric" "numeric" "numeric" "numeric"
## B_F AIAN_M AIAN_F API_M API_F
## "numeric" "numeric" "numeric" "numeric" "numeric"
## 10 to 14 years 15 to 19 years 20 to 24 years 25 to 29 years
## 3060 3060 3060 3060
## 30 to 34 years 35 to 39 years 40 to 44 years 45 to 49 years
## 3060 3060 3060 3060
## 5 to 9 years 50 to 54 years 55 to 59 years 60 to 64 years
## 3060 3060 3060 3060
## 65 to 69 years 70 to 74 years 75 to 79 years 80 to 84 years
## 3060 3060 3060 3060
## 85 years and over Under 5 years
## 3060 3060
dem_90_99 <- dem_90_99 %>%
mutate(AGE_GROUP = cut(Age,
breaks = seq(0,90, by=5),
right = FALSE,
labels = c("Under 5 years",
"5 to 9 years",
"10 to 14 years",
"15 to 19 years",
"20 to 24 years",
"25 to 29 years",
"30 to 34 years",
"35 to 39 years",
"40 to 44 years",
"45 to 49 years",
"50 to 54 years",
"55 to 59 years",
"60 to 64 years",
"65 to 69 years",
"70 to 74 years",
"75 to 79 years",
"80 to 84 years",
"85 years and over")
)) %>%
dplyr::select(-Age) %>%
mutate(AGE_GROUP = as.character(AGE_GROUP))
sapply(dem_90_99, class)## YEAR STATEFP W_M W_F B_M B_F
## "numeric" "character" "numeric" "numeric" "numeric" "numeric"
## AIAN_M AIAN_F API_M API_F AGE_GROUP
## "numeric" "numeric" "numeric" "numeric" "character"
dem_90_99 <- dem_90_99 %>%
group_by(YEAR, STATEFP, AGE_GROUP) %>%
summarise_at(vars(starts_with("W_"),
starts_with("B_"),
starts_with("AIAN_"),
starts_with("API_")), sum) %>%
ungroup() %>%
pivot_longer(cols = c(starts_with("W_"),
starts_with("B_"),
starts_with("AIAN_"),
starts_with("API_")),
names_to = "RACE",
values_to = "SUB_POP")
dem_90_99 <- dem_90_99 %>%
mutate(SEX = case_when(str_detect(RACE, "_M") ~ "Male",
TRUE ~ "Female"),
RACE = case_when(str_detect(RACE, "W_") ~ "White",
str_detect(RACE, "B_") ~ "Black",
TRUE ~ "Other")) %>%
left_join(STATE_FIPS, by = "STATEFP") %>%
dplyr::select(-STATEFP)
pop_90_99 <- dem_90_99 %>%
group_by(YEAR, STATE) %>%
summarise(TOT_POP = sum(SUB_POP), .groups = "drop")
dem_90_99 <- dem_90_99 %>%
left_join(pop_90_99, by=c("YEAR", "STATE")) %>%
mutate(PERC_SUB_POP = SUB_POP*100/TOT_POP) %>%
dplyr::select(-SUB_POP, -TOT_POP)The following data was downloaded from the US Census Bureau.
Click here for relevant technical documentation
dem_00_10 <- list.files(recursive = TRUE,
path = "docs/Demographics/Decade_2000/",
pattern = "*.csv",
full.names = TRUE) %>%
map(~read_csv(.))## Parsed with column specification:
## cols(
## .default = col_double(),
## NAME = col_character()
## )
## See spec(...) for full column specifications.
## REGION DIVISION STATE NAME
## "numeric" "numeric" "numeric" "character"
## SEX ORIGIN RACE AGEGRP
## "numeric" "numeric" "numeric" "numeric"
## ESTIMATESBASE2000 POPESTIMATE2000 POPESTIMATE2001 POPESTIMATE2002
## "numeric" "numeric" "numeric" "numeric"
## POPESTIMATE2003 POPESTIMATE2004 POPESTIMATE2005 POPESTIMATE2006
## "numeric" "numeric" "numeric" "numeric"
## POPESTIMATE2007 POPESTIMATE2008 POPESTIMATE2009 CENSUS2010POP
## "numeric" "numeric" "numeric" "numeric"
## POPESTIMATE2010
## "numeric"
dem_00_10 <- dem_00_10 %>%
dplyr::select(-ESTIMATESBASE2000,-CENSUS2010POP) %>%
filter(REGION != 0,
DIVISION != 0,
SEX != 0,
ORIGIN == 0,
RACE != 0,
AGEGRP != 0,
STATE != 0) %>%
dplyr::select(-REGION, -DIVISION, -ORIGIN, -STATE) %>%
rename("STATE"=NAME,
"AGE_GROUP"=AGEGRP) %>%
mutate(SEX = factor(SEX,
levels = 1:2,
labels = c("Male",
"Female")),
RACE = factor(RACE,
levels = 1:6,
labels = c("White",
"Black",
rep("Other",4))),
AGE_GROUP = factor(AGE_GROUP,
levels = 1:18,
labels = c("Under 5 years",
"5 to 9 years",
"10 to 14 years",
"15 to 19 years",
"20 to 24 years",
"25 to 29 years",
"30 to 34 years",
"35 to 39 years",
"40 to 44 years",
"45 to 49 years",
"50 to 54 years",
"55 to 59 years",
"60 to 64 years",
"65 to 69 years",
"70 to 74 years",
"75 to 79 years",
"80 to 84 years",
"85 years and over"))) %>%
mutate(SEX = as.character(SEX),
RACE = as.character(RACE),
AGE_GROUP = as.character(AGE_GROUP))
colnames(dem_00_10)## [1] "STATE" "SEX" "RACE" "AGE_GROUP"
## [5] "POPESTIMATE2000" "POPESTIMATE2001" "POPESTIMATE2002" "POPESTIMATE2003"
## [9] "POPESTIMATE2004" "POPESTIMATE2005" "POPESTIMATE2006" "POPESTIMATE2007"
## [13] "POPESTIMATE2008" "POPESTIMATE2009" "POPESTIMATE2010"
dem_00_10 <- dem_00_10 %>%
pivot_longer(cols=contains("ESTIMATE"),
names_to = "YEAR",
values_to = "SUB_POP")
dem_00_10 <- dem_00_10 %>%
mutate(YEAR = str_sub(YEAR, start=-4)) %>%
mutate(YEAR = as.numeric(YEAR))
sapply(dem_00_10, class)## STATE SEX RACE AGE_GROUP YEAR SUB_POP
## "character" "character" "character" "character" "numeric" "numeric"
pop_00_10 <- dem_00_10 %>%
group_by(YEAR, STATE) %>%
summarise(TOT_POP = sum(SUB_POP), .groups = "drop")
dem_00_10 %>%
left_join(pop_00_10, by=c("YEAR", "STATE")) %>%
group_by(YEAR, STATE) %>%
mutate(PERC_SUB_POP = SUB_POP*100/TOT_POP) %>%
summarise(perc_tot = sum(PERC_SUB_POP), .groups = "drop") %>%
mutate(poss_error = case_when(abs(perc_tot - 100) > 0 ~ TRUE,
TRUE ~ FALSE)) %>%
group_by(poss_error) %>%
tally()## # A tibble: 1 x 2
## poss_error n
## <lgl> <int>
## 1 FALSE 561
## [1] TRUE
## [1] TRUE
## [1] TRUE
## # A tibble: 6 x 6
## YEAR STATE RACE SEX AGE_GROUP PERC_SUB_POP
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1977 Alabama White Male Under 5 years 2.61
## 2 1977 Alabama White Male 5 to 9 years 3.00
## 3 1977 Alabama White Male 10 to 14 years 3.25
## 4 1977 Alabama White Male 15 to 19 years 3.58
## 5 1977 Alabama White Male 20 to 24 years 3.33
## 6 1977 Alabama White Male 25 to 29 years 2.95
## # A tibble: 6 x 6
## YEAR STATE AGE_GROUP SEX RACE PERC_SUB_POP
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1980 Alabama 10 to 14 years Female Black 1.28
## 2 1980 Alabama 10 to 14 years Female Other 0.0206
## 3 1980 Alabama 10 to 14 years Female White 2.80
## 4 1980 Alabama 10 to 14 years Male Black 1.30
## 5 1980 Alabama 10 to 14 years Male Other 0.0212
## 6 1980 Alabama 10 to 14 years Male White 2.97
## # A tibble: 6 x 6
## YEAR AGE_GROUP RACE SEX STATE PERC_SUB_POP
## <dbl> <chr> <chr> <chr> <chr> <dbl>
## 1 1990 10 to 14 years White Male Alabama 2.46
## 2 1990 10 to 14 years White Female Alabama 2.33
## 3 1990 10 to 14 years Black Male Alabama 1.21
## 4 1990 10 to 14 years Black Female Alabama 1.20
## 5 1990 10 to 14 years Other Male Alabama 0.0239
## 6 1990 10 to 14 years Other Female Alabama 0.0235
## # A tibble: 6 x 6
## STATE SEX RACE AGE_GROUP YEAR PERC_SUB_POP
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Alabama Male White Under 5 years 2000 2.24
## 2 Alabama Male White Under 5 years 2001 2.24
## 3 Alabama Male White Under 5 years 2002 2.22
## 4 Alabama Male White Under 5 years 2003 2.21
## 5 Alabama Male White Under 5 years 2004 2.20
## 6 Alabama Male White Under 5 years 2005 2.20
## [1] 18
## [1] 18
## [1] 18
## [1] 18
dem <- bind_rows(dem_77_79,
dem_80_89,
dem_90_99,
dem_00_10)
dem %>%
filter(RACE == "Other") %>%
group_by(YEAR) %>%
tally() %>%
summarise(years_data = n())## # A tibble: 1 x 1
## years_data
## <int>
## 1 34
## [1] 34
DONOHUE_AGE_GROUPS <- c("15 to 19 years",
"20 to 24 years",
"25 to 29 years",
"30 to 34 years",
"35 to 39 years")
DONOHUE_RACE <- c("White",
"Black",
"Other")
DONOHUE_SEX <- c("Male")
dem_DONOHUE <- dem %>%
filter(AGE_GROUP %in% DONOHUE_AGE_GROUPS,
RACE %in% DONOHUE_RACE,
SEX %in% DONOHUE_SEX) %>%
mutate(AGE_GROUP = fct_collapse(AGE_GROUP, "20 to 39 years"=c("20 to 24 years",
"25 to 29 years",
"30 to 34 years",
"35 to 39 years"))) %>%
mutate(AGE_GROUP = str_replace_all(AGE_GROUP," ","_")) %>%
group_by(YEAR, STATE, RACE, SEX, AGE_GROUP) %>%
summarise(PERC_SUB_POP = sum(PERC_SUB_POP), .groups = "drop") %>%
unite(col = "VARIABLE", RACE, SEX, AGE_GROUP, sep = "_") %>%
rename("VALUE"=PERC_SUB_POP)
LOTT_AGE_GROUPS_NULL <- c("Under 5 years",
"5 to 9 years")
LOTT_RACE <- c("White",
"Black",
"Other")
LOTT_SEX <- c("Male",
"Female")
dem_LOTT <- dem %>%
filter(!(AGE_GROUP %in% LOTT_AGE_GROUPS_NULL),
RACE %in% LOTT_RACE,
SEX %in% LOTT_SEX) %>%
mutate(AGE_GROUP = fct_collapse(AGE_GROUP,
"10 to 19 years"=c("10 to 14 years",
"15 to 19 years"),
"20 to 29 years"=c("20 to 24 years",
"25 to 29 years"),
"30 to 39 years"=c("30 to 34 years",
"35 to 39 years"),
"40 to 49 years"=c("40 to 44 years",
"45 to 49 years"),
"50 to 64 years"=c("50 to 54 years",
"55 to 59 years",
"60 to 64 years"),
"65 years and over"=c("65 to 69 years",
"70 to 74 years",
"75 to 79 years",
"80 to 84 years",
"85 years and over"))) %>%
mutate(AGE_GROUP = str_replace_all(AGE_GROUP," ","_")) %>%
group_by(YEAR, STATE, RACE, SEX, AGE_GROUP) %>%
summarise(PERC_SUB_POP = sum(PERC_SUB_POP), .groups = "drop") %>%
unite(col = "VARIABLE", RACE, SEX, AGE_GROUP, sep = "_") %>%
rename("VALUE"=PERC_SUB_POP)
dim(expand.grid(c(1:6), c(7:8), c(9:10)))[1]## [1] 24
## [1] TRUE
## [1] TRUE
## [1] TRUE
## # A tibble: 6 x 3
## YEAR STATE TOT_POP
## <dbl> <chr> <dbl>
## 1 1977 Alabama 3782571
## 2 1977 Alaska 397220
## 3 1977 Arizona 2427296
## 4 1977 Arkansas 2207195
## 5 1977 California 22350332
## 6 1977 Colorado 2696179
## # A tibble: 6 x 3
## YEAR STATE TOT_POP
## <dbl> <chr> <dbl>
## 1 1980 Alabama 3899671
## 2 1980 Alaska 404680
## 3 1980 Arizona 2735840
## 4 1980 Arkansas 2288809
## 5 1980 California 23792840
## 6 1980 Colorado 2909545
## # A tibble: 6 x 3
## YEAR STATE TOT_POP
## <dbl> <chr> <dbl>
## 1 1990 Alabama 4048508
## 2 1990 Alaska 553120
## 3 1990 Arizona 3679056
## 4 1990 Arkansas 2354343
## 5 1990 California 29950111
## 6 1990 Colorado 3303862
## # A tibble: 6 x 3
## YEAR STATE TOT_POP
## <dbl> <chr> <dbl>
## 1 2000 Alabama 4452173
## 2 2000 Alaska 627963
## 3 2000 Arizona 5160586
## 4 2000 Arkansas 2678588
## 5 2000 California 33987977
## 6 2000 Colorado 4326921
population_data <- bind_rows(pop_77_79,
pop_80_89,
pop_90_99,
pop_00_10)
population_data %>%
group_by(YEAR) %>%
tally() %>%
print(n = dim(.)[1])## # A tibble: 34 x 2
## YEAR n
## <dbl> <int>
## 1 1977 51
## 2 1978 51
## 3 1979 51
## 4 1980 51
## 5 1981 51
## 6 1982 51
## 7 1983 51
## 8 1984 51
## 9 1985 51
## 10 1986 51
## 11 1987 51
## 12 1988 51
## 13 1989 51
## 14 1990 51
## 15 1991 51
## 16 1992 51
## 17 1993 51
## 18 1994 51
## 19 1995 51
## 20 1996 51
## 21 1997 51
## 22 1998 51
## 23 1999 51
## 24 2000 51
## 25 2001 51
## 26 2002 51
## 27 2003 51
## 28 2004 51
## 29 2005 51
## 30 2006 51
## 31 2007 51
## 32 2008 51
## 33 2009 51
## 34 2010 51
The following data was downloaded from the Federal Bureau of Investigation.
ps_data <- read_csv("docs/Police_staffing/pe_1960_2018.csv",
col_types = cols(male_total_ct = "n",
female_total_ct = "n"))## Warning: 2874273 parsing failures.
## row col expected actual file
## 4115 female_officer_ct 1/0/T/F/TRUE/FALSE 8 'docs/Police_staffing/pe_1960_2018.csv'
## 4115 officer_ct 1/0/T/F/TRUE/FALSE 37 'docs/Police_staffing/pe_1960_2018.csv'
## 4115 civilian_ct 1/0/T/F/TRUE/FALSE 3 'docs/Police_staffing/pe_1960_2018.csv'
## 4115 total_pe_ct 1/0/T/F/TRUE/FALSE 40 'docs/Police_staffing/pe_1960_2018.csv'
## 4115 pe_ct_per_1000 1/0/T/F/TRUE/FALSE 2.00 'docs/Police_staffing/pe_1960_2018.csv'
## .... ................. .................. ...... .......................................
## See problems(...) for more details.
## [1] "data_year" "ori" "pub_agency_name"
## [4] "pub_agency_unit" "state_abbr" "division_name"
## [7] "region_name" "county_name" "agency_type_name"
## [10] "population_group_desc" "population" "male_officer_ct"
## [13] "male_civilian_ct" "male_total_ct" "female_officer_ct"
## [16] "female_civilian_ct" "female_total_ct" "officer_ct"
## [19] "civilian_ct" "total_pe_ct" "pe_ct_per_1000"
ps_data <- ps_data %>%
filter(data_year >= 1977,
data_year <= 2014) %>%
mutate(male_total_ct = case_when(is.na(male_total_ct) ~ 0,
TRUE ~ male_total_ct),
female_total_ct = case_when(is.na(female_total_ct) ~ 0,
TRUE ~ female_total_ct)) %>%
mutate(officer_total = male_total_ct + female_total_ct) %>%
dplyr::select(data_year,
pub_agency_name,
state_abbr,
officer_total)
ps_data <- ps_data %>%
group_by(data_year, state_abbr) %>%
summarise(officer_state_total=sum(officer_total), .groups = "drop")
ps_data %>%
group_by(state_abbr) %>%
tally() %>%
print(n = dim(.)[1])## # A tibble: 59 x 2
## state_abbr n
## <chr> <int>
## 1 AK 38
## 2 AL 38
## 3 AR 38
## 4 AS 38
## 5 AZ 38
## 6 CA 38
## 7 CO 38
## 8 CT 38
## 9 CZ 38
## 10 DC 38
## 11 DE 38
## 12 FL 38
## 13 FS 38
## 14 GA 38
## 15 GM 38
## 16 HI 38
## 17 IA 38
## 18 ID 38
## 19 IL 38
## 20 IN 38
## 21 KS 38
## 22 KY 38
## 23 LA 38
## 24 MA 38
## 25 MD 38
## 26 ME 38
## 27 MI 38
## 28 MN 38
## 29 MO 38
## 30 MP 38
## 31 MS 38
## 32 MT 38
## 33 NB 38
## 34 NC 38
## 35 ND 38
## 36 NH 38
## 37 NJ 38
## 38 NM 38
## 39 NV 38
## 40 NY 38
## 41 OH 38
## 42 OK 38
## 43 OR 38
## 44 OT 38
## 45 PA 38
## 46 PR 38
## 47 RI 38
## 48 SC 38
## 49 SD 38
## 50 TN 38
## 51 TX 38
## 52 UT 38
## 53 VA 38
## 54 VI 38
## 55 VT 38
## 56 WA 38
## 57 WI 38
## 58 WV 38
## 59 WY 38
# NB is Nebraska. This was changed to NE to avoid confusions with NB in Canada. This dataset uses NB
state_of_interest_NULL <- c("AS",
"GM",
"CZ",
"FS",
"MP",
"OT",
"PR",
"VI")
state_abb_df <- as.data.frame(cbind(state.abb, state.name))
colnames(state_abb_df) <- c("state_abbr", "STATE")
print(state_abb_df)## state_abbr STATE
## 1 AL Alabama
## 2 AK Alaska
## 3 AZ Arizona
## 4 AR Arkansas
## 5 CA California
## 6 CO Colorado
## 7 CT Connecticut
## 8 DE Delaware
## 9 FL Florida
## 10 GA Georgia
## 11 HI Hawaii
## 12 ID Idaho
## 13 IL Illinois
## 14 IN Indiana
## 15 IA Iowa
## 16 KS Kansas
## 17 KY Kentucky
## 18 LA Louisiana
## 19 ME Maine
## 20 MD Maryland
## 21 MA Massachusetts
## 22 MI Michigan
## 23 MN Minnesota
## 24 MS Mississippi
## 25 MO Missouri
## 26 MT Montana
## 27 NE Nebraska
## 28 NV Nevada
## 29 NH New Hampshire
## 30 NJ New Jersey
## 31 NM New Mexico
## 32 NY New York
## 33 NC North Carolina
## 34 ND North Dakota
## 35 OH Ohio
## 36 OK Oklahoma
## 37 OR Oregon
## 38 PA Pennsylvania
## 39 RI Rhode Island
## 40 SC South Carolina
## 41 SD South Dakota
## 42 TN Tennessee
## 43 TX Texas
## 44 UT Utah
## 45 VT Vermont
## 46 VA Virginia
## 47 WA Washington
## 48 WV West Virginia
## 49 WI Wisconsin
## 50 WY Wyoming
state_abb_df <- state_abb_df %>%
add_row(state_abbr="DC",
STATE="District of Columbia")
denominator_temp <- population_data %>%
dplyr::select(-VARIABLE) %>%
rename("Population_temp"=VALUE)
ps_data <- ps_data %>%
filter(!(state_abbr %in% state_of_interest_NULL)) %>%
mutate(state_abbr = case_when(state_abbr == "NB" ~ "NE",
TRUE ~ state_abbr)) %>%
left_join(state_abb_df, by = "state_abbr") %>%
dplyr::select(-state_abbr) %>%
rename(YEAR = "data_year",
VALUE = "officer_state_total") %>%
mutate(VARIABLE = "officer_state_total") %>%
left_join(denominator_temp, by=c("STATE","YEAR")) %>%
mutate(VALUE = (VALUE*100000) / Population_temp) %>%
mutate(VALUE = lag(VALUE)) %>%
mutate(VARIABLE = "police_per_100k_lag") %>%
dplyr::select(-Population_temp)https://data.bls.gov/cgi-bin/dsrv?la
ue_rate_data <- list.files(recursive = TRUE,
path = "docs/Unemployment",
pattern = "*.xlsx",
full.names = TRUE) %>%
map(~read_xlsx(., skip = 10))
ue_rate_names <- list.files(recursive = TRUE,
path = "docs/Unemployment",
pattern = "*.xlsx",
full.names = TRUE) %>%
map(~read_xlsx(.)) %>%
sapply(., "[",7,2, drop=TRUE)## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
## # A tibble: 1 x 14
## Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2020 3.2 2.9 3 13.3 NA NA NA NA NA NA NA NA
## # … with 1 more variable: Annual <dbl>
## [1] "STATE" "Year" "Jan" "Feb" "Mar" "Apr" "May" "Jun"
## [9] "Jul" "Aug" "Sep" "Oct" "Nov" "Dec" "Annual"
## STATE Year Jan Feb Mar Apr
## "character" "numeric" "numeric" "numeric" "numeric" "numeric"
## May Jun Jul Aug Sep Oct
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## Nov Dec Annual
## "numeric" "numeric" "numeric"
Extracted from Table 21 from US Census Bureau
persistent warning from unknown origin https://community.rstudio.com/t/persistent-unknown-or-uninitialised-column-warnings/64879
solution to above is alledgedly: “In any case the suggested approach is to initialize the column”
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## # A tibble: 6 x 6
## `NOTE: Number in thousa… ...2 ...3 ...4 ...5 ...6
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 2018 <NA> <NA> <NA> <NA> <NA>
## 2 STATE Total Number "Standard\n… Percent "Standard\ner…
## 3 Alabama 4877 779 "65" 16 "1.3"
## 4 Alaska 720 94 "9" 13.1 "1.2"
## 5 Arizona 7241 929 "80" 12.80000000… "1.1000000000…
## 6 Arkansas 2912 462 "38" 15.9 "1.3"
colnames(poverty_rate_data) <- c("STATE",
"Total",
"Number",
"Number_se",
"Percent",
"Percent_se")
tail(poverty_rate_data)## # A tibble: 6 x 6
## STATE Total Number Number_se Percent Percent_se
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Wisconsin 4724 403 57 8.5 1.1000000000…
## 2 Wyoming 468 49 20 10.4 4
## 3 Standard errors shown in this ta… <NA> <NA> <NA> <NA> <NA>
## 4 For information on confidentiali… <NA> <NA> <NA> <NA> <NA>
## 5 Footnotes are available at <www.… <NA> <NA> <NA> <NA> <NA>
## 6 SOURCE: U.S. Bureau of the Censu… <NA> <NA> <NA> <NA> <NA>
notes <- 4
poverty_rate_data <- poverty_rate_data[-((dim(poverty_rate_data)[1]-notes+1):dim(poverty_rate_data)[1]),]
states_eq <- 51
extra_col <- 2
rep_rows <- states_eq + extra_col
groups <- (dim(poverty_rate_data)[1])/(rep_rows)
paste(groups - (2018-1980 + 1), "extra groups")## [1] "2 extra groups"
poverty_rate_data$year_group <- rep(1:groups, each=rep_rows)
poverty_rate_data <- poverty_rate_data %>%
group_by(year_group) %>%
group_split()
head(poverty_rate_data[[1]])## # A tibble: 6 x 7
## STATE Total Number Number_se Percent Percent_se year_group
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 2018 <NA> <NA> <NA> <NA> <NA> 1
## 2 STATE Total Number "Standard\ner… Percent "Standard\nerro… 1
## 3 Alabama 4877 779 "65" 16 "1.3" 1
## 4 Alaska 720 94 "9" 13.1 "1.2" 1
## 5 Arizona 7241 929 "80" 12.8000000000… "1.100000000000… 1
## 6 Arkans… 2912 462 "38" 15.9 "1.3" 1
poverty_rate_data <- poverty_rate_data %>%
map(~mutate(.,
row_id = row_number())) %>%
map(~filter(.,row_id != 2)) %>%
map(~dplyr::select(.,-row_id))
poverty_rate_data_names <- poverty_rate_data %>%
sapply(., "[",1,1, drop=TRUE) %>%
str_replace_all(.,"[:space:]","_")
names(poverty_rate_data) <- poverty_rate_data_names
# Recall 2 extra groups.
# footnotes available at https://www.census.gov/topics/income-poverty/poverty/guidance/poverty-footnotes/cps-historic-footnotes.html
poverty_rate_data$`2017_(21)` <- NULL
poverty_rate_data$`2013_(19)` <- NULL
poverty_rate_data_names <- poverty_rate_data %>%
sapply(., "[",1,1, drop=TRUE) %>%
str_sub(., start = 1, end=4)
names(poverty_rate_data) <- poverty_rate_data_names
poverty_rate_data <- poverty_rate_data %>%
map_df(bind_rows, .id = "YEAR") %>%
dplyr::select(-year_group)
poverty_rate_data <- poverty_rate_data %>%
mutate(n_na = rowSums(is.na(.)))
# This shows that there is systematic missing values stemmingly *solely* from the rows without poverty data and only a label designating the year
poverty_rate_data %>%
group_by(n_na) %>%
tally()## # A tibble: 2 x 2
## n_na n
## <dbl> <int>
## 1 0 1989
## 2 5 39
## YEAR STATE Total Number Number_se Percent
## "character" "character" "character" "character" "character" "character"
## Percent_se n_na
## "character" "numeric"
poverty_rate_data <- poverty_rate_data %>%
drop_na() %>%
dplyr::select(-Number,
-Number_se,
-Percent_se,
-n_na,
-Total) %>%
rename("VALUE"=Percent) %>%
mutate(VARIABLE = "Poverty_rate",
YEAR = as.numeric(YEAR),
VALUE = as.numeric(VALUE))
colnames(poverty_rate_data)## [1] "YEAR" "STATE" "VALUE" "VARIABLE"
https://www.ucrdatatool.gov/Search/Crime/State/StatebyState.cfm
crime_data <- read_lines("docs/Crime/CrimeStatebyState.csv", skip = 2, skip_empty_rows = TRUE)
length(crime_data)## [1] 2254
crime_data <- crime_data[-(2143:length(crime_data))]
x <- 2014-1977+1
rep_cycle <- 2 + 2 + x
rep_cycle_cut <- 2 + x
delete_rows <- c(seq(2,length(crime_data),rep_cycle),
seq(3,length(crime_data),rep_cycle))
crime_data <- crime_data[-delete_rows]
crime_data <- data.frame(cbind(crime_data, rep(1:(length(crime_data)/rep_cycle_cut),each=rep_cycle_cut)))
colnames(crime_data) <- c("String","STATE_GROUP")
crime_data <- crime_data %>%
group_by(STATE_GROUP) %>%
group_split()
columns_crime_data <- 8
crime_data <- crime_data %>%
map(~mutate(.,
State = case_when(str_detect(String, "Estimated crime in ") ~ substring(String, nchar("Estimated crime in ")+1)),
row_id = row_number())) %>%
map(~fill(., State)) %>%
map(~filter(.,row_id > 2)) %>%
map(~mutate(.,
String = paste0(String, ",", State))) %>%
map(~dplyr::select(.,String)) %>%
map(~str_split_fixed(.$String,",",columns_crime_data + 1)) %>%
map(~data.frame(.)) %>%
map(~rename(.,"YEAR"=X1,
"Extra_col1"=X2,
"VC"=X3,
"Extra_col2"=X4,
"Extra_col3"=X5,
"Extra_col4"=X6,
"Extra_col5"=X7,
"Extra_col6"=X8,
"STATE"=X9)) %>%
map(~dplyr::select(.,-contains("Extra_col"))) %>%
map(~.x %>% mutate_all(~trimws(.,which = "both"))) %>%
map_df(bind_rows)
sapply(crime_data, class)## YEAR VC STATE
## "character" "character" "character"
crime_data <- crime_data %>%
mutate(VARIABLE = "Viol_crime_count") %>%
rename("VALUE" = VC) %>%
as.tibble() %>%
mutate(YEAR = as.numeric(YEAR),
VALUE = as.numeric(VALUE))## Warning: `as.tibble()` is deprecated as of tibble 2.0.0.
## Please use `as_tibble()` instead.
## The signature and semantics have changed, see `?as_tibble`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
Extracted from table in Donohue paper
if(!file.exists(here("docs", "w23510.pdf"))){
url <- "https://www.nber.org/papers/w23510.pdf"
utils::download.file(url, here("docs", "w23510.pdf"))
}
syn_control_paper <- pdf_text(here("docs", "w23510.pdf"))
syn_control_paper_p_62 <- syn_control_paper[[62]]
p_62 <- syn_control_paper_p_62 %>%
strsplit("\n") %>%
unlist() %>%
as.data.frame() %>%
slice(-(1:2))
apply(p_62, 1, nchar)## [1] 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
## [20] 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109 109
## [39] 109 109 109 109 109 109 109 109 109 109 109 109 109 109 63
## [1] " 60"
## [1] 3 4 4 4 3 4 2 3 2 4 4 3 4 3 4 4 4 4 4 4 3 2 4 4 4 4 4 4 4 2 3 3 3 3 3 4 4 4
## [39] 3 3 2 3 3 4 4 4 3 4 2 3 4 4
## [1] 2 3 3 3 2 3 2 2 2 3 3 2 3 2 3 3 3 3 3 3 2 2 3 3 3 3 3 3 3 2 2 3 2 3 3 3 3 3
## [39] 3 3 2 3 3 3 3 3 2 3 2 3 3 3
## [1] 2 2 2 2 1 2 1 1 1 2 2 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2
## [39] 2 2 1 2 2 2 2 2 2 2 1 2 2 2
## [1] 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
## [39] 0 0 1 0 0 0 0 0 1 0 1 0 0 0
## .
## 1 Alabama 1975 1975
## 2 Alaska 10/1/1994 0.252 1995
## 3 Arizona 7/17/1994 0.460 1995
## 4 Arkansas 7/27/1995 0.433 1996
## 5 California N/A 0
## 6 Colorado 5/17/2003 0.627 2003
## apply(p_62, 1, str_count, "\\\\s{40,}")
## 1 1
## 2 0
## 3 0
## 4 0
## 5 1
## 6 0
p_62 <- p_62 %>%
apply(1,str_replace_all, "\\s{40,}", "|N/A|") %>%
str_replace_all("\\s{2,15}", "|") %>%
as.data.frame()
p_62 <- sapply(p_62$., str_split, "\\|{1,}")
sapply(p_62, nchar)## $`|Alabama||1975|N/A|1975`
## [1] 0 7 4 3 4
##
## $`|Alaska||10/1/1994||0.252|||1995`
## [1] 0 6 9 5 4
##
## $`|Arizona| 7/17/1994||0.460|||1995`
## [1] 0 7 10 5 4
##
## $`|Arkansas| 7/27/1995||0.433|||1996`
## [1] 0 8 10 5 4
##
## $`|California||N/A|N/A|0`
## [1] 0 10 3 3 1
##
## $`|Colorado| 5/17/2003||0.627|||2003`
## [1] 0 8 10 5 4
##
## $`|Connecticut||1970|N/A|1970`
## [1] 0 11 4 3 4
##
## $`|Delaware||N/A|N/A|0`
## [1] 0 8 3 3 1
##
## $`District of Columbia|N/A|N/A|0`
## [1] 20 3 3 1
##
## $`|Florida| 10/1/1987||0.252|||1988`
## [1] 0 7 10 5 4
##
## $`|Georgia| 8/25/1989||0.353|||1990`
## [1] 0 7 10 5 4
##
## $`|Hawaii||N/A|N/A|0`
## [1] 0 6 3 3 1
##
## $`|Idaho||7/1/1990||0.504|||1990`
## [1] 0 5 8 5 4
##
## $`|Illinois| 1/5/2014|N/A|2014`
## [1] 0 8 9 3 4
##
## $`|Indiana| 1/15/1980||0.962|||1980`
## [1] 0 7 10 5 4
##
## $`|Iowa||1/1/2011||1.000|||2011`
## [1] 0 4 8 5 4
##
## $`|Kansas||1/1/2007||1.000|||2007`
## [1] 0 6 8 5 4
##
## $`|Kentucky| 10/1/1996||0.251|||1997`
## [1] 0 8 10 5 4
##
## $`|Louisiana|4/19/1996||0.702|||1996`
## [1] 0 9 9 5 4
##
## $`|Maine||9/19/1985||0.285|||1986`
## [1] 0 5 9 5 4
##
## $`|Maryland||N/A|N/A|0`
## [1] 0 8 3 3 1
##
## $`|Massachusetts||N/A|N/A|0`
## [1] 0 13 3 3 1
##
## $`|Michigan||7/1/2001||0.504|||2001`
## [1] 0 8 8 5 4
##
## $`|Minnesota| 5/28/2003||0.597|||2003`
## [1] 0 9 10 5 4
##
## $`|Mississippi|7/1/1990||0.504|||1990`
## [1] 0 11 8 5 4
##
## $`|Missouri| 2/26/2004||0.847|||2004`
## [1] 0 8 10 5 4
##
## $`|Montana||10/1/1991||0.252|||1992`
## [1] 0 7 9 5 4
##
## $`|Nebraska||1/1/2007||1.000|||2007`
## [1] 0 8 8 5 4
##
## $`|Nevada||10/1/1995||0.252|||1996`
## [1] 0 6 9 5 4
##
## $`|New Hampshire||1959|N/A|1959`
## [1] 0 13 4 3 4
##
## $`|New Jersey||N/A|N/A|0`
## [1] 0 10 3 3 1
##
## $`|New Mexico||1/1/2004||1.000|||2004`
## [1] 0 10 8 5 4
##
## $`|New York||N/A|N/A|0`
## [1] 0 8 3 3 1
##
## $`|North Carolina|12/1/1995||0.085|||1996`
## [1] 0 14 9 5 4
##
## $`|North Dakota| 8/1/1985||0.419|||1986`
## [1] 0 12 9 5 4
##
## $`|Ohio||4/8/2004||0.732|||2004`
## [1] 0 4 8 5 4
##
## $`|Oklahoma||1/1/1996||1.000|||1996`
## [1] 0 8 8 5 4
##
## $`|Oregon||1/1/1990||1.000|||1990`
## [1] 0 6 8 5 4
##
## $`|Pennsylvania|6/17/1989||0.542|||1989`
## [1] 0 12 9 5 4
##
## $`|Philadelphia|10/11/1995||0.225|||1996`
## [1] 0 12 10 5 4
##
## $`|Rhode Island||N/A|N/A|0`
## [1] 0 12 3 3 1
##
## $`|South Carolina|8/23/1996||0.358|||1997`
## [1] 0 14 9 5 4
##
## $`|South Dakota| 7/1/1985||0.504|||1985`
## [1] 0 12 9 5 4
##
## $`|Tennessee| 10/1/1996||0.251|||1997`
## [1] 0 9 10 5 4
##
## $`|Texas||1/1/1996||1.000|||1996`
## [1] 0 5 8 5 4
##
## $`|Utah||5/1/1995||0.671|||1995`
## [1] 0 4 8 5 4
##
## $`|Vermont||1970|N/A|1970`
## [1] 0 7 4 3 4
##
## $`|Virginia| 5/5/1995||0.660|||1995`
## [1] 0 8 9 5 4
##
## $`|Washington||1961|N/A|1961`
## [1] 0 10 4 3 4
##
## $`|West Virginia|7/7/1989||0.488|||1990`
## [1] 0 13 8 5 4
##
## $`|Wisconsin| 11/1/2011||0.167|||2012`
## [1] 0 9 10 5 4
##
## $`|Wyoming||10/1/1994||0.252|||1995`
## [1] 0 7 9 5 4
p_62 <- lapply(p_62, function(x) x[nchar(x) > 0])
p_62 <- as.data.frame(do.call(rbind, p_62))
rownames(p_62)## [1] "|Alabama||1975|N/A|1975"
## [2] "|Alaska||10/1/1994||0.252|||1995"
## [3] "|Arizona| 7/17/1994||0.460|||1995"
## [4] "|Arkansas| 7/27/1995||0.433|||1996"
## [5] "|California||N/A|N/A|0"
## [6] "|Colorado| 5/17/2003||0.627|||2003"
## [7] "|Connecticut||1970|N/A|1970"
## [8] "|Delaware||N/A|N/A|0"
## [9] "District of Columbia|N/A|N/A|0"
## [10] "|Florida| 10/1/1987||0.252|||1988"
## [11] "|Georgia| 8/25/1989||0.353|||1990"
## [12] "|Hawaii||N/A|N/A|0"
## [13] "|Idaho||7/1/1990||0.504|||1990"
## [14] "|Illinois| 1/5/2014|N/A|2014"
## [15] "|Indiana| 1/15/1980||0.962|||1980"
## [16] "|Iowa||1/1/2011||1.000|||2011"
## [17] "|Kansas||1/1/2007||1.000|||2007"
## [18] "|Kentucky| 10/1/1996||0.251|||1997"
## [19] "|Louisiana|4/19/1996||0.702|||1996"
## [20] "|Maine||9/19/1985||0.285|||1986"
## [21] "|Maryland||N/A|N/A|0"
## [22] "|Massachusetts||N/A|N/A|0"
## [23] "|Michigan||7/1/2001||0.504|||2001"
## [24] "|Minnesota| 5/28/2003||0.597|||2003"
## [25] "|Mississippi|7/1/1990||0.504|||1990"
## [26] "|Missouri| 2/26/2004||0.847|||2004"
## [27] "|Montana||10/1/1991||0.252|||1992"
## [28] "|Nebraska||1/1/2007||1.000|||2007"
## [29] "|Nevada||10/1/1995||0.252|||1996"
## [30] "|New Hampshire||1959|N/A|1959"
## [31] "|New Jersey||N/A|N/A|0"
## [32] "|New Mexico||1/1/2004||1.000|||2004"
## [33] "|New York||N/A|N/A|0"
## [34] "|North Carolina|12/1/1995||0.085|||1996"
## [35] "|North Dakota| 8/1/1985||0.419|||1986"
## [36] "|Ohio||4/8/2004||0.732|||2004"
## [37] "|Oklahoma||1/1/1996||1.000|||1996"
## [38] "|Oregon||1/1/1990||1.000|||1990"
## [39] "|Pennsylvania|6/17/1989||0.542|||1989"
## [40] "|Philadelphia|10/11/1995||0.225|||1996"
## [41] "|Rhode Island||N/A|N/A|0"
## [42] "|South Carolina|8/23/1996||0.358|||1997"
## [43] "|South Dakota| 7/1/1985||0.504|||1985"
## [44] "|Tennessee| 10/1/1996||0.251|||1997"
## [45] "|Texas||1/1/1996||1.000|||1996"
## [46] "|Utah||5/1/1995||0.671|||1995"
## [47] "|Vermont||1970|N/A|1970"
## [48] "|Virginia| 5/5/1995||0.660|||1995"
## [49] "|Washington||1961|N/A|1961"
## [50] "|West Virginia|7/7/1989||0.488|||1990"
## [51] "|Wisconsin| 11/1/2011||0.167|||2012"
## [52] "|Wyoming||10/1/1994||0.252|||1995"
rownames(p_62) <- c()
colnames(p_62) <- c("STATE",
"E_Date_RTC",
"Frac_Yr_Eff_Yr_Pass",
"RTC_Date_SA")
sapply(p_62, class)## STATE E_Date_RTC Frac_Yr_Eff_Yr_Pass RTC_Date_SA
## "character" "character" "character" "character"
p_62 <- p_62 %>%
dplyr::select(STATE, RTC_Date_SA) %>%
rename("RTC_LAW_YEAR"=RTC_Date_SA) %>%
mutate(RTC_LAW_YEAR = as.numeric(RTC_LAW_YEAR)) %>%
mutate(RTC_LAW_YEAR = case_when(RTC_LAW_YEAR == 0 ~ Inf,
TRUE ~ RTC_LAW_YEAR))
sapply(p_62, class)## STATE RTC_LAW_YEAR
## "character" "numeric"
## STATE RTC_LAW_YEAR
## 1 Alabama 1975
## 2 Alaska 1995
## 3 Arizona 1995
## 4 Arkansas 1996
## 5 California Inf
## 6 Colorado 2003
## [1] "YEAR" "STATE" "VARIABLE" "VALUE"
## [1] "YEAR" "STATE" "VARIABLE" "VALUE"
## [1] "STATE" "YEAR" "VALUE" "VARIABLE"
## [1] "YEAR" "STATE" "VALUE" "VARIABLE"
## [1] "YEAR" "VALUE" "STATE" "VARIABLE"
## # A tibble: 6 x 4
## YEAR STATE VARIABLE VALUE
## <dbl> <chr> <chr> <dbl>
## 1 1977 Alabama Black_Male_15_to_19_years 1.55
## 2 1977 Alabama Black_Male_20_to_39_years 3.04
## 3 1977 Alabama Other_Male_15_to_19_years 0.0178
## 4 1977 Alabama Other_Male_20_to_39_years 0.0642
## 5 1977 Alabama White_Male_15_to_19_years 3.58
## 6 1977 Alabama White_Male_20_to_39_years 11.1
## # A tibble: 6 x 4
## YEAR STATE VARIABLE VALUE
## <dbl> <chr> <chr> <dbl>
## 1 1977 Alabama Black_Female_10_to_19_years 3.01
## 2 1977 Alabama Black_Female_20_to_29_years 2.33
## 3 1977 Alabama Black_Female_30_to_39_years 1.29
## 4 1977 Alabama Black_Female_40_to_49_years 1.18
## 5 1977 Alabama Black_Female_50_to_64_years 1.73
## 6 1977 Alabama Black_Female_65_years_and_over 1.58
## # A tibble: 6 x 4
## STATE YEAR VALUE VARIABLE
## <chr> <dbl> <dbl> <chr>
## 1 Alabama 1977 7.3 Unemployment_rate
## 2 Alabama 1978 6.4 Unemployment_rate
## 3 Alabama 1979 7.2 Unemployment_rate
## 4 Alabama 1980 8.9 Unemployment_rate
## 5 Alabama 1981 10.6 Unemployment_rate
## 6 Alabama 1982 14.1 Unemployment_rate
## # A tibble: 6 x 4
## YEAR STATE VALUE VARIABLE
## <dbl> <chr> <dbl> <chr>
## 1 2018 Alabama 16 Poverty_rate
## 2 2018 Alaska 13.1 Poverty_rate
## 3 2018 Arizona 12.8 Poverty_rate
## 4 2018 Arkansas 15.9 Poverty_rate
## 5 2018 California 11.9 Poverty_rate
## 6 2018 Colorado 9.1 Poverty_rate
## # A tibble: 6 x 4
## YEAR VALUE STATE VARIABLE
## <dbl> <dbl> <chr> <chr>
## 1 1977 15293 Alabama Viol_crime_count
## 2 1978 15682 Alabama Viol_crime_count
## 3 1979 15578 Alabama Viol_crime_count
## 4 1980 17320 Alabama Viol_crime_count
## 5 1981 18423 Alabama Viol_crime_count
## 6 1982 17653 Alabama Viol_crime_count
DONOHUE_DF <- bind_rows(dem_DONOHUE,
ue_rate_data,
poverty_rate_data,
crime_data,
population_data,
ps_data) %>%
pivot_wider(names_from = "VARIABLE",
values_from = "VALUE") %>%
left_join(p_62 , by = c("STATE")) %>%
mutate(RTC_LAW = case_when(YEAR >= RTC_LAW_YEAR ~ TRUE,
TRUE ~ FALSE))
DONOHUE_DF %>%
group_by(YEAR) %>%
tally() %>%
filter(n != 51) %>%
print(n=dim(.)[1])## # A tibble: 33 x 2
## YEAR n
## <dbl> <int>
## 1 1980 52
## 2 1981 52
## 3 1982 52
## 4 1983 52
## 5 1984 52
## 6 1985 52
## 7 1986 52
## 8 1987 52
## 9 1988 52
## 10 1989 52
## 11 1990 52
## 12 1991 52
## 13 1992 52
## 14 1993 52
## 15 1994 52
## 16 1995 52
## 17 1996 52
## 18 1997 52
## 19 1998 52
## 20 1999 52
## 21 2000 52
## 22 2001 52
## 23 2002 52
## 24 2003 52
## 25 2004 52
## 26 2005 52
## 27 2006 52
## 28 2007 52
## 29 2008 52
## 30 2009 52
## 31 2010 52
## 32 2011 52
## 33 2012 52
## Alabama Alaska Arizona
## 44 44 44
## Arkansas California Colorado
## 44 44 44
## Connecticut D.C. Delaware
## 44 33 44
## District of Columbia Florida Georgia
## 44 44 44
## Hawaii Idaho Illinois
## 44 44 44
## Indiana Iowa Kansas
## 44 44 44
## Kentucky Louisiana Maine
## 44 44 44
## Maryland Massachusetts Michigan
## 44 44 44
## Minnesota Mississippi Missouri
## 44 44 44
## Montana Nebraska Nevada
## 44 44 44
## New Hampshire New Jersey New Mexico
## 44 44 44
## New York North Carolina North Dakota
## 44 44 44
## Ohio Oklahoma Oregon
## 44 44 44
## Pennsylvania Rhode Island South Carolina
## 44 44 44
## South Dakota Tennessee Texas
## 44 44 44
## Utah Vermont Virginia
## 44 44 44
## Washington West Virginia Wisconsin
## 44 44 44
## Wyoming
## 44
## [1] 44
DONOHUE_DF <- DONOHUE_DF %>%
mutate(STATE = fct_collapse(STATE, "District of Columbia"=c("District of Columbia","D.C.")))
summary(as.factor(DONOHUE_DF$STATE))## Alabama Alaska Arizona
## 44 44 44
## Arkansas California Colorado
## 44 44 44
## Connecticut District of Columbia Delaware
## 44 77 44
## Florida Georgia Hawaii
## 44 44 44
## Idaho Illinois Indiana
## 44 44 44
## Iowa Kansas Kentucky
## 44 44 44
## Louisiana Maine Maryland
## 44 44 44
## Massachusetts Michigan Minnesota
## 44 44 44
## Mississippi Missouri Montana
## 44 44 44
## Nebraska Nevada New Hampshire
## 44 44 44
## New Jersey New Mexico New York
## 44 44 44
## North Carolina North Dakota Ohio
## 44 44 44
## Oklahoma Oregon Pennsylvania
## 44 44 44
## Rhode Island South Carolina South Dakota
## 44 44 44
## Tennessee Texas Utah
## 44 44 44
## Vermont Virginia Washington
## 44 44 44
## West Virginia Wisconsin Wyoming
## 44 44 44
## [1] 51
DONOHUE_DF <- DONOHUE_DF %>%
group_by(STATE, YEAR) %>%
summarise_all(~na.omit(unique(.))) %>%
ungroup() # This identifies unique observations, coalesces rows according to the grouping variable(s), and gets rid of of units that have incomplete data. This gives returns a dataframe with the most complete information.
summary(as.factor(DONOHUE_DF$STATE)) ## Alabama Alaska Arizona
## 31 31 31
## Arkansas California Colorado
## 31 31 31
## Connecticut District of Columbia Delaware
## 31 31 31
## Florida Georgia Hawaii
## 31 31 31
## Idaho Illinois Indiana
## 31 31 31
## Iowa Kansas Kentucky
## 31 31 31
## Louisiana Maine Maryland
## 31 31 31
## Massachusetts Michigan Minnesota
## 31 31 31
## Mississippi Missouri Montana
## 31 31 31
## Nebraska Nevada New Hampshire
## 31 31 31
## New Jersey New Mexico New York
## 31 31 31
## North Carolina North Dakota Ohio
## 31 31 31
## Oklahoma Oregon Pennsylvania
## 31 31 31
## Rhode Island South Carolina South Dakota
## 31 31 31
## Tennessee Texas Utah
## 31 31 31
## Vermont Virginia Washington
## 31 31 31
## West Virginia Wisconsin Wyoming
## 31 31 31
baseline_year <- min(DONOHUE_DF$YEAR)
censoring_year <- max(DONOHUE_DF$YEAR)
# Need to fix this to ensure severe bias is not introduced by prevalent "cases"
DONOHUE_DF <- DONOHUE_DF %>%
mutate(TIME_0 = baseline_year,
TIME_INF = censoring_year) %>%
filter(RTC_LAW_YEAR > TIME_0)
DONOHUE_DF <- DONOHUE_DF %>%
mutate(Viol_crime_rate_1k = (Viol_crime_count*1000)/Population,
Viol_crime_rate_1k_log = log(Viol_crime_rate_1k),
Population_log = log(Population))
summary(droplevels(as.factor(DONOHUE_DF$STATE)))## Alaska Arizona Arkansas
## 31 31 31
## California Colorado District of Columbia
## 31 31 31
## Delaware Florida Georgia
## 31 31 31
## Hawaii Idaho Illinois
## 31 31 31
## Iowa Kansas Kentucky
## 31 31 31
## Louisiana Maine Maryland
## 31 31 31
## Massachusetts Michigan Minnesota
## 31 31 31
## Mississippi Missouri Montana
## 31 31 31
## Nebraska Nevada New Jersey
## 31 31 31
## New Mexico New York North Carolina
## 31 31 31
## North Dakota Ohio Oklahoma
## 31 31 31
## Oregon Pennsylvania Rhode Island
## 31 31 31
## South Carolina South Dakota Tennessee
## 31 31 31
## Texas Utah Virginia
## 31 31 31
## West Virginia Wisconsin Wyoming
## 31 31 31
## [1] 45
LOTT_DF <- bind_rows(dem_LOTT,
ue_rate_data,
poverty_rate_data,
crime_data,
population_data,
ps_data) %>%
pivot_wider(names_from = "VARIABLE",
values_from = "VALUE") %>%
left_join(p_62 , by = c("STATE")) %>%
mutate(RTC_LAW = case_when(YEAR >= RTC_LAW_YEAR ~ TRUE,
TRUE ~ FALSE))
LOTT_DF %>%
group_by(YEAR) %>%
tally() %>%
filter(n != 51) %>%
print(n=dim(.)[1])## # A tibble: 33 x 2
## YEAR n
## <dbl> <int>
## 1 1980 52
## 2 1981 52
## 3 1982 52
## 4 1983 52
## 5 1984 52
## 6 1985 52
## 7 1986 52
## 8 1987 52
## 9 1988 52
## 10 1989 52
## 11 1990 52
## 12 1991 52
## 13 1992 52
## 14 1993 52
## 15 1994 52
## 16 1995 52
## 17 1996 52
## 18 1997 52
## 19 1998 52
## 20 1999 52
## 21 2000 52
## 22 2001 52
## 23 2002 52
## 24 2003 52
## 25 2004 52
## 26 2005 52
## 27 2006 52
## 28 2007 52
## 29 2008 52
## 30 2009 52
## 31 2010 52
## 32 2011 52
## 33 2012 52
## Alabama Alaska Arizona
## 44 44 44
## Arkansas California Colorado
## 44 44 44
## Connecticut D.C. Delaware
## 44 33 44
## District of Columbia Florida Georgia
## 44 44 44
## Hawaii Idaho Illinois
## 44 44 44
## Indiana Iowa Kansas
## 44 44 44
## Kentucky Louisiana Maine
## 44 44 44
## Maryland Massachusetts Michigan
## 44 44 44
## Minnesota Mississippi Missouri
## 44 44 44
## Montana Nebraska Nevada
## 44 44 44
## New Hampshire New Jersey New Mexico
## 44 44 44
## New York North Carolina North Dakota
## 44 44 44
## Ohio Oklahoma Oregon
## 44 44 44
## Pennsylvania Rhode Island South Carolina
## 44 44 44
## South Dakota Tennessee Texas
## 44 44 44
## Utah Vermont Virginia
## 44 44 44
## Washington West Virginia Wisconsin
## 44 44 44
## Wyoming
## 44
## [1] 44
LOTT_DF <- LOTT_DF %>%
mutate(STATE = fct_collapse(STATE, "District of Columbia"=c("District of Columbia","D.C.")))
summary(as.factor(LOTT_DF$STATE))## Alabama Alaska Arizona
## 44 44 44
## Arkansas California Colorado
## 44 44 44
## Connecticut District of Columbia Delaware
## 44 77 44
## Florida Georgia Hawaii
## 44 44 44
## Idaho Illinois Indiana
## 44 44 44
## Iowa Kansas Kentucky
## 44 44 44
## Louisiana Maine Maryland
## 44 44 44
## Massachusetts Michigan Minnesota
## 44 44 44
## Mississippi Missouri Montana
## 44 44 44
## Nebraska Nevada New Hampshire
## 44 44 44
## New Jersey New Mexico New York
## 44 44 44
## North Carolina North Dakota Ohio
## 44 44 44
## Oklahoma Oregon Pennsylvania
## 44 44 44
## Rhode Island South Carolina South Dakota
## 44 44 44
## Tennessee Texas Utah
## 44 44 44
## Vermont Virginia Washington
## 44 44 44
## West Virginia Wisconsin Wyoming
## 44 44 44
## [1] 51
LOTT_DF <- LOTT_DF %>%
group_by(STATE, YEAR) %>%
summarise_all(~na.omit(unique(.))) %>%
ungroup() # This identifies unique observations, coalesces rows according to the grouping variable(s), and gets rid of of units that have incomplete data. This gives returns a dataframe with the most complete information.
summary(as.factor(LOTT_DF$STATE)) ## Alabama Alaska Arizona
## 31 31 31
## Arkansas California Colorado
## 31 31 31
## Connecticut District of Columbia Delaware
## 31 31 31
## Florida Georgia Hawaii
## 31 31 31
## Idaho Illinois Indiana
## 31 31 31
## Iowa Kansas Kentucky
## 31 31 31
## Louisiana Maine Maryland
## 31 31 31
## Massachusetts Michigan Minnesota
## 31 31 31
## Mississippi Missouri Montana
## 31 31 31
## Nebraska Nevada New Hampshire
## 31 31 31
## New Jersey New Mexico New York
## 31 31 31
## North Carolina North Dakota Ohio
## 31 31 31
## Oklahoma Oregon Pennsylvania
## 31 31 31
## Rhode Island South Carolina South Dakota
## 31 31 31
## Tennessee Texas Utah
## 31 31 31
## Vermont Virginia Washington
## 31 31 31
## West Virginia Wisconsin Wyoming
## 31 31 31
baseline_year <- min(LOTT_DF$YEAR)
censoring_year <- max(LOTT_DF$YEAR)
# Need to fix this to ensure severe bias is not introduced by prevalent "cases"
LOTT_DF <- LOTT_DF %>%
mutate(TIME_0 = baseline_year,
TIME_INF = censoring_year) %>%
filter(RTC_LAW_YEAR > TIME_0)
LOTT_DF <- LOTT_DF %>%
mutate(Viol_crime_rate_1k = (Viol_crime_count*1000)/Population,
Viol_crime_rate_1k_log = log(Viol_crime_rate_1k),
Population_log = log(Population))
summary(droplevels(as.factor(LOTT_DF$STATE)))## Alaska Arizona Arkansas
## 31 31 31
## California Colorado District of Columbia
## 31 31 31
## Delaware Florida Georgia
## 31 31 31
## Hawaii Idaho Illinois
## 31 31 31
## Iowa Kansas Kentucky
## 31 31 31
## Louisiana Maine Maryland
## 31 31 31
## Massachusetts Michigan Minnesota
## 31 31 31
## Mississippi Missouri Montana
## 31 31 31
## Nebraska Nevada New Jersey
## 31 31 31
## New Mexico New York North Carolina
## 31 31 31
## North Dakota Ohio Oklahoma
## 31 31 31
## Oregon Pennsylvania Rhode Island
## 31 31 31
## South Carolina South Dakota Tennessee
## 31 31 31
## Texas Utah Virginia
## 31 31 31
## West Virginia Wisconsin Wyoming
## 31 31 31
## [1] 45
## STATE YEAR Black_Male_15_to_19_years
## "factor" "numeric" "numeric"
## Black_Male_20_to_39_years Other_Male_15_to_19_years Other_Male_20_to_39_years
## "numeric" "numeric" "numeric"
## White_Male_15_to_19_years White_Male_20_to_39_years Unemployment_rate
## "numeric" "numeric" "numeric"
## Poverty_rate Viol_crime_count Population
## "numeric" "numeric" "numeric"
## police_per_100k_lag RTC_LAW_YEAR RTC_LAW
## "numeric" "numeric" "logical"
## TIME_0 TIME_INF Viol_crime_rate_1k
## "numeric" "numeric" "numeric"
## Viol_crime_rate_1k_log Population_log
## "numeric" "numeric"
DONOHUE_DF %>%
mutate(Viol_crime_rate_100k_log = log((Viol_crime_count*100000)/Population)) %>%
ggplot(aes(x = YEAR, y = Viol_crime_rate_100k_log, color = STATE)) +
geom_point(size = 0.5) +
geom_line(aes(group=STATE),
size = 0.5,
show.legend = FALSE) +
geom_text_repel(data = DONOHUE_DF %>%
mutate(Viol_crime_rate_100k_log = log((Viol_crime_count*100000)/Population)) %>%
filter(YEAR == last(YEAR)),
aes(label = STATE,
x = YEAR,
y = Viol_crime_rate_100k_log),
size = 3,
alpha = 1,
nudge_x = 10,
direction = "y",
hjust = 1,
vjust = 1,
segment.size = 0.25,
segment.alpha = 0.25,
force = 1,
max.iter = 9999) +
guides(color = FALSE) +
scale_x_continuous(breaks = seq(1980, 2015, by = 1),
limits = c(1980, 2015),
labels = c(seq(1980, 2010, by = 1), rep("", 5))) +
scale_y_continuous(breaks = seq(3.5, 8.5, by = 0.5),
limits = c(3.5, 8.5)) +
labs(title = "States have different levels of crime",
x = "Year",
y = "ln(violent crimes per 100,000 people)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90))DONOHUE_DF %>%
group_by(YEAR) %>%
summarise(Viol_crime_count = sum(Viol_crime_count),
Population = sum(Population),
.groups = "drop") %>%
mutate(Viol_crime_rate_100k_log = log((Viol_crime_count*100000)/Population)) %>%
ggplot(aes(x = YEAR, y = Viol_crime_rate_100k_log)) +
geom_line() +
scale_x_continuous(breaks = seq(1980, 2010, by = 1),
limits = c(1980, 2010),
labels = c(seq(1980, 2010, by = 1))) +
scale_y_continuous(breaks = seq(5.75, 6.75, by = 0.25),
limits = c(5.75, 6.75)) +
labs(title = "Crime rates fluctuate over time",
x = "Year",
y = "ln(violent crimes per 100,000 people)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90))Some code taken from http://karthur.org/2019/implementing-fixed-effects-panel-models-in-r.html
d_panel_DONOHUE <- pdata.frame(DONOHUE_DF, index=c("STATE", "YEAR"))
DONOHUE_OUTPUT <- plm(Viol_crime_rate_1k_log ~
RTC_LAW +
White_Male_15_to_19_years +
White_Male_20_to_39_years +
Black_Male_15_to_19_years +
Black_Male_20_to_39_years +
Other_Male_15_to_19_years +
Other_Male_20_to_39_years +
Unemployment_rate +
Poverty_rate +
Population_log +
police_per_100k_lag,
effect = "twoways",
model = "within",
data=d_panel_DONOHUE)
summary(DONOHUE_OUTPUT)## Twoways effects Within Model
##
## Call:
## plm(formula = Viol_crime_rate_1k_log ~ RTC_LAW + White_Male_15_to_19_years +
## White_Male_20_to_39_years + Black_Male_15_to_19_years + Black_Male_20_to_39_years +
## Other_Male_15_to_19_years + Other_Male_20_to_39_years + Unemployment_rate +
## Poverty_rate + Population_log + police_per_100k_lag, data = d_panel_DONOHUE,
## effect = "twoways", model = "within")
##
## Balanced Panel: n = 45, T = 31, N = 1395
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.5716985 -0.0933827 0.0022014 0.0896372 1.0943035
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## RTC_LAWTRUE 0.02306214 0.01666508 1.3839 0.1666372
## White_Male_15_to_19_years -0.00263364 0.02732105 -0.0964 0.9232208
## White_Male_20_to_39_years 0.04060493 0.00960488 4.2275 2.527e-05 ***
## Black_Male_15_to_19_years -0.10348754 0.05695300 -1.8171 0.0694351 .
## Black_Male_20_to_39_years 0.12003823 0.01938589 6.1920 7.934e-10 ***
## Other_Male_15_to_19_years 0.66332029 0.11311115 5.8643 5.703e-09 ***
## Other_Male_20_to_39_years -0.25380488 0.03938074 -6.4449 1.622e-10 ***
## Unemployment_rate -0.01626228 0.00490799 -3.3134 0.0009468 ***
## Poverty_rate -0.00890507 0.00295638 -3.0122 0.0026438 **
## Population_log -0.22442622 0.06060682 -3.7030 0.0002219 ***
## police_per_100k_lag 0.00047990 0.00013737 3.4935 0.0004926 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 43.211
## Residual Sum of Squares: 36.917
## R-Squared: 0.14565
## Adj. R-Squared: 0.090168
## F-statistic: 20.2864 on 11 and 1309 DF, p-value: < 2.22e-16
Some code taken from http://karthur.org/2019/implementing-fixed-effects-panel-models-in-r.html
LOTT_variables <- LOTT_DF %>%
dplyr::select(RTC_LAW,
contains(c("White","Black","Other")),
Unemployment_rate,
Poverty_rate,
Population_log,
police_per_100k_lag) %>%
colnames()
LOTT_fmla <- as.formula(paste("Viol_crime_rate_1k_log ~",
paste(LOTT_variables, collapse = " + ")
)
)
d_panel_LOTT <- pdata.frame(LOTT_DF, index=c("STATE", "YEAR"))
LOTT_OUTPUT <- plm(LOTT_fmla,
model = "within",
effect = "twoways",
data=d_panel_LOTT)
summary(LOTT_OUTPUT)## Twoways effects Within Model
##
## Call:
## plm(formula = LOTT_fmla, data = d_panel_LOTT, effect = "twoways",
## model = "within")
##
## Balanced Panel: n = 45, T = 31, N = 1395
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.565457 -0.078747 0.001635 0.079232 0.577838
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## RTC_LAWTRUE -0.05422970 0.01658286 -3.2702 0.001103 **
## White_Female_10_to_19_years 0.65063823 0.15149131 4.2949 1.880e-05 ***
## White_Female_20_to_29_years -0.02997137 0.06349534 -0.4720 0.636990
## White_Female_30_to_39_years 0.13132568 0.08104309 1.6204 0.105384
## White_Female_40_to_49_years 0.09211246 0.08234849 1.1186 0.263534
## White_Female_50_to_64_years -0.37475798 0.06335128 -5.9156 4.240e-09 ***
## White_Female_65_years_and_over 0.20314547 0.04759664 4.2681 2.117e-05 ***
## White_Male_10_to_19_years -0.61566593 0.14489592 -4.2490 2.303e-05 ***
## White_Male_20_to_29_years 0.06466063 0.05901511 1.0957 0.273433
## White_Male_30_to_39_years -0.10412806 0.08620494 -1.2079 0.227304
## White_Male_40_to_49_years -0.21815118 0.07329480 -2.9764 0.002972 **
## White_Male_50_to_64_years 0.38433281 0.07355515 5.2251 2.031e-07 ***
## White_Male_65_years_and_over -0.21601720 0.06691569 -3.2282 0.001277 **
## Black_Female_10_to_19_years -1.20662463 0.43502280 -2.7737 0.005623 **
## Black_Female_20_to_29_years 0.02942780 0.17544389 0.1677 0.866820
## Black_Female_30_to_39_years -0.15149500 0.20475568 -0.7399 0.459508
## Black_Female_40_to_49_years 0.42380646 0.23611111 1.7949 0.072898 .
## Black_Female_50_to_64_years 0.13802304 0.21419499 0.6444 0.519444
## Black_Female_65_years_and_over -0.07820224 0.24128320 -0.3241 0.745908
## Black_Male_10_to_19_years 1.36102016 0.44538035 3.0559 0.002291 **
## Black_Male_20_to_29_years -0.14034048 0.18460396 -0.7602 0.447260
## Black_Male_30_to_39_years 0.37937138 0.23590699 1.6081 0.108051
## Black_Male_40_to_49_years -0.58107771 0.27188867 -2.1372 0.032772 *
## Black_Male_50_to_64_years -0.25317586 0.24011393 -1.0544 0.291899
## Black_Male_65_years_and_over -0.46825225 0.34645213 -1.3516 0.176754
## Other_Female_10_to_19_years 0.57481127 0.49957581 1.1506 0.250112
## Other_Female_20_to_29_years -1.12453492 0.27172673 -4.1385 3.725e-05 ***
## Other_Female_30_to_39_years -3.15698149 0.35788016 -8.8213 < 2.2e-16 ***
## Other_Female_40_to_49_years 0.96646809 0.42423419 2.2781 0.022882 *
## Other_Female_50_to_64_years 2.97254960 0.34040734 8.7323 < 2.2e-16 ***
## Other_Female_65_years_and_over 2.25872753 0.20551782 10.9904 < 2.2e-16 ***
## Other_Male_10_to_19_years 0.24715044 0.48305107 0.5116 0.608988
## Other_Male_20_to_29_years 1.58436219 0.25907190 6.1155 1.276e-09 ***
## Other_Male_30_to_39_years 2.91519635 0.41689628 6.9926 4.336e-12 ***
## Other_Male_40_to_49_years -1.22100778 0.44740943 -2.7291 0.006439 **
## Other_Male_50_to_64_years -3.92082993 0.37595040 -10.4291 < 2.2e-16 ***
## Other_Male_65_years_and_over -4.10090950 0.37041352 -11.0712 < 2.2e-16 ***
## Unemployment_rate -0.00499765 0.00437844 -1.1414 0.253907
## Poverty_rate -0.00571967 0.00254133 -2.2507 0.024576 *
## Population_log -0.26192101 0.08472606 -3.0914 0.002035 **
## police_per_100k_lag 0.00051043 0.00012220 4.1771 3.153e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 43.211
## Residual Sum of Squares: 23.825
## R-Squared: 0.44863
## Adj. R-Squared: 0.39906
## F-statistic: 25.3824 on 41 and 1279 DF, p-value: < 2.22e-16
comparing_analyses <- DONOHUE_OUTPUT_TIDY %>%
bind_rows(LOTT_OUTPUT_TIDY) %>%
filter(term == "RTC_LAWTRUE")
library(grid)
comparing_analyses_plot <- ggplot(comparing_analyses) +
geom_point(aes(x = Analysis, y = estimate)) +
geom_errorbar(aes(x = Analysis, ymin = conf.low, ymax = conf.high), width = 0.25) +
geom_hline(yintercept = 0, color = "red") +
scale_y_continuous(breaks = seq(-0.2, 0.2, by = 0.05),
labels = seq(-0.2, 0.2, by = 0.05),
limits = c(-0.2,0.2)) +
geom_segment(aes(x = 1, y = 0.125, xend = 1, yend = 0.175),
arrow = arrow(angle = 45, ends = "last", type = "open"),
size = 2,
color = "green",
lineend = "butt",
linejoin = "mitre") +
geom_segment(aes(x = 2, y = -0.125, xend = 2, yend = -0.175),
arrow = arrow(angle = 45, ends = "last", type = "open"),
size = 2,
color = "red",
lineend = "butt",
linejoin = "mitre") +
theme_minimal() +
theme(axis.title.x = element_blank(),
axis.text = element_text(size = 12)) +
labs(title = "Effect estimate on ln(violent crimes per 100,000 people)",
y = "Effect estimate (95% CI)")
comparing_analyses_plotHow did the above happen?
The analysis dataframes are very similar yet rendered very different results.
## - different number of columns: 20 vs 50
## [1] TRUE
The only difference between the two dataframes rests in how the demographic variables were parameterized.
## [1] "Black_Male_15_to_19_years" "Black_Male_20_to_39_years"
## [3] "Other_Male_15_to_19_years" "Other_Male_20_to_39_years"
## [5] "White_Male_15_to_19_years" "White_Male_20_to_39_years"
## [1] "Black_Female_10_to_19_years" "Black_Female_20_to_29_years"
## [3] "Black_Female_30_to_39_years" "Black_Female_40_to_49_years"
## [5] "Black_Female_50_to_64_years" "Black_Female_65_years_and_over"
## [7] "Black_Male_10_to_19_years" "Black_Male_20_to_29_years"
## [9] "Black_Male_30_to_39_years" "Black_Male_40_to_49_years"
## [11] "Black_Male_50_to_64_years" "Black_Male_65_years_and_over"
## [13] "Other_Female_10_to_19_years" "Other_Female_20_to_29_years"
## [15] "Other_Female_30_to_39_years" "Other_Female_40_to_49_years"
## [17] "Other_Female_50_to_64_years" "Other_Female_65_years_and_over"
## [19] "Other_Male_10_to_19_years" "Other_Male_20_to_29_years"
## [21] "Other_Male_30_to_39_years" "Other_Male_40_to_49_years"
## [23] "Other_Male_50_to_64_years" "Other_Male_65_years_and_over"
## [25] "White_Female_10_to_19_years" "White_Female_20_to_29_years"
## [27] "White_Female_30_to_39_years" "White_Female_40_to_49_years"
## [29] "White_Female_50_to_64_years" "White_Female_65_years_and_over"
## [31] "White_Male_10_to_19_years" "White_Male_20_to_29_years"
## [33] "White_Male_30_to_39_years" "White_Male_40_to_49_years"
## [35] "White_Male_50_to_64_years" "White_Male_65_years_and_over"
Clearly, this had an effect on the results of the analysis.
Let’s explore how this occured.
When seemingly independent variables are highly related to one another, the relationships estimated in an analysis may be distorted.
In regression analysis, this distortion is often a byproduct of a violation of the independence assumption. This distortion, if large enough, can impact statistical inference.
There are several ways we can diagnose multicollinearity.
Again, multicollinearity often occurs when independent variables are highly related to one another. Consequently, we can evaluate these relationships be examining the correlation between variable pairs.
It is important to note that multicollinearity and correlation are not one and the same. Correlation can be thought of as the strength of the relationship between variables. On the other hand, multicollinearity can be thought of the the violation of the independence assumption that is a consequence of this correlation in a regression analysis.
## [1] "STATE" "YEAR"
## [3] "Black_Male_15_to_19_years" "Black_Male_20_to_39_years"
## [5] "Other_Male_15_to_19_years" "Other_Male_20_to_39_years"
## [7] "White_Male_15_to_19_years" "White_Male_20_to_39_years"
## [9] "Unemployment_rate" "Poverty_rate"
## [11] "Viol_crime_count" "Population"
## [13] "police_per_100k_lag" "RTC_LAW_YEAR"
## [15] "RTC_LAW" "TIME_0"
## [17] "TIME_INF" "Viol_crime_rate_1k"
## [19] "Viol_crime_rate_1k_log" "Population_log"
DONOHUE_DF %>%
dplyr::select(RTC_LAW,
Viol_crime_rate_1k_log,
Unemployment_rate,
Poverty_rate,
Population_log) %>%
ggpairs(.,
columns = c(2:5),
lower = list(continuous = wrap("smooth_loess",
color = "red",
alpha = 0.5,
size = 0.1)))LOTT_DF %>%
dplyr::select(RTC_LAW,
Viol_crime_rate_1k_log,
Unemployment_rate,
Poverty_rate,
Population_log) %>%
ggpairs(.,
columns = c(2:5),
lower = list(continuous = wrap("smooth_loess",
color = "red",
alpha = 0.5,
size = 0.1)))cor_DONOHUE_dem <- cor(DONOHUE_DF %>% dplyr::select(contains("_years")))
corr_mat_DONOHUE <- ggcorrplot(cor_DONOHUE_dem,
tl.cex = 6,
hc.order = TRUE,
colors = c("red",
"white",
"red"),
outline.color = "transparent",
title = "Correlation Matrix, Analysis 1",
legend.title = TeX("$\\rho$"))
corr_mat_DONOHUEcor_LOTT_dem <- cor(LOTT_DF %>% dplyr::select(contains("_years")))
corr_mat_LOTT <- ggcorrplot(cor_LOTT_dem,
tl.cex = 6,
hc.order = TRUE,
colors = c("red",
"white",
"red"),
outline.color = "transparent",
title = "Correlation Matrix, Analysis 2",
legend.title = TeX("$\\rho$"))
corr_mat_LOTTsims <- 250
# DONOHUE
# round(dim(DONOHUE_DF)[1]/2)
samps_DONOHUE <- lapply(rep(dim(DONOHUE_DF)[1]-1, sims),
function(x)DONOHUE_DF[sample(nrow(DONOHUE_DF),
size = x, replace = FALSE),])
fit_nls_on_bootstrap_DONOHUE <- function(split){
plm(Viol_crime_rate_1k_log ~
RTC_LAW +
White_Male_15_to_19_years +
White_Male_20_to_39_years +
Black_Male_15_to_19_years +
Black_Male_20_to_39_years +
Other_Male_15_to_19_years +
Other_Male_20_to_39_years +
Unemployment_rate +
Poverty_rate +
Population_log +
police_per_100k_lag,
data = data.frame(split),
index = c("STATE","YEAR"),
model = "within",
effect = "twoways")
}
samps_models_DONOHUE <- lapply(samps_DONOHUE, fit_nls_on_bootstrap_DONOHUE)
samps_models_DONOHUE <- samps_models_DONOHUE %>%
map(tidy)
names(samps_models_DONOHUE) <- paste0("DONOHUE_",1:length(samps_models_DONOHUE))
simulations_DONOHUE <- samps_models_DONOHUE %>%
bind_rows(.id = "ID") %>%
mutate(Analysis = "Analysis 1")
## LOTT
samps_LOTT <- lapply(rep(round(dim(LOTT_DF)[1]/2), sims),
function(x) LOTT_DF[sample(nrow(LOTT_DF),
size = x, replace = FALSE),])
fit_nls_on_bootstrap_LOTT <- function(split){
plm(LOTT_fmla,
data = data.frame(split),
index = c("STATE","YEAR"),
model = "within",
effect = "twoways")
}
samps_models_LOTT <- lapply(samps_LOTT, fit_nls_on_bootstrap_LOTT)
samps_models_LOTT <- samps_models_LOTT %>%
map(tidy)
names(samps_models_LOTT) <- paste0("LOTT_",1:length(samps_models_LOTT))
simulations_LOTT <- samps_models_LOTT %>%
bind_rows(.id = "Analysis") %>%
mutate(Analysis = "Analysis 2")
simulations <- bind_rows(simulations_DONOHUE,
simulations_LOTT)
simulation_plot <- simulations %>%
filter(term=="RTC_LAWTRUE") %>%
ggplot(aes(x = Analysis, y = estimate)) +
geom_jitter(alpha = 0.25,
width = 0.1) +
labs(title = "Coefficient instability",
subtitle = "Estimates sensitive to observation deletions",
x = "Term",
y = "Coefficient",
caption = "Results from simulations") +
theme_minimal() +
theme(axis.title.x = element_blank())
simulation_plotdesign.matrix <- as.data.frame(model.matrix(DONOHUE_OUTPUT))
design.matrix$Viol_crime_rate_1k_log <- plm::Within(
d_panel_DONOHUE$Viol_crime_rate_1k_log)
lm_DONOHUE <- lm(Viol_crime_rate_1k_log ~
RTC_LAWTRUE + # logical class changes variable name after inital model
White_Male_15_to_19_years +
White_Male_20_to_39_years +
Black_Male_15_to_19_years +
Black_Male_20_to_39_years +
Other_Male_15_to_19_years +
Other_Male_20_to_39_years +
Unemployment_rate +
Poverty_rate +
Population_log +
police_per_100k_lag,
data = design.matrix)
vif(lm_DONOHUE)## RTC_LAWTRUE White_Male_15_to_19_years White_Male_20_to_39_years
## 1.095268 1.172703 1.685342
## Black_Male_15_to_19_years Black_Male_20_to_39_years Other_Male_15_to_19_years
## 1.313339 1.656860 1.574839
## Other_Male_20_to_39_years Unemployment_rate Poverty_rate
## 1.623750 1.242150 1.262682
## Population_log police_per_100k_lag
## 1.154825 1.163058
vif_DONOHUE <- vif(lm_DONOHUE)
vif_DONOHUE <- vif_DONOHUE %>%
as_tibble() %>%
cbind(., names(vif_DONOHUE)) %>%
as_tibble()
colnames(vif_DONOHUE) <- c("VIF", "Variable")
max_vif_DONOHUE <- max(vif(lm_DONOHUE)) design.matrix <- as.data.frame(model.matrix(LOTT_OUTPUT))
design.matrix$Viol_crime_rate_1k_log <- plm::Within(
d_panel_LOTT$Viol_crime_rate_1k_log)
LOTT_variables_ols <- LOTT_DF %>%
dplyr::select(RTC_LAW,
contains(c("White","Black","Other")),
Unemployment_rate,
Poverty_rate,
Population_log,
police_per_100k_lag) %>%
colnames() %>%
str_replace("RTC_LAW", "RTC_LAWTRUE") # logical class changes variable name after inital model
LOTT_fmla_ols <- as.formula(paste("Viol_crime_rate_1k_log ~",
paste(LOTT_variables_ols, collapse = " + ")
)
)
lm_LOTT <- lm(LOTT_fmla_ols,
data = design.matrix)
vif(lm_LOTT)## RTC_LAWTRUE White_Female_10_to_19_years
## 1.641916 127.733327
## White_Female_20_to_29_years White_Female_30_to_39_years
## 42.184712 49.980961
## White_Female_40_to_49_years White_Female_50_to_64_years
## 37.856684 36.547007
## White_Female_65_years_and_over White_Male_10_to_19_years
## 12.863500 126.795600
## White_Male_20_to_29_years White_Male_30_to_39_years
## 39.477520 73.002593
## White_Male_40_to_49_years White_Male_50_to_64_years
## 31.686738 52.974511
## White_Male_65_years_and_over Black_Female_10_to_19_years
## 13.311999 330.982883
## Black_Female_20_to_29_years Black_Female_30_to_39_years
## 106.841150 78.289450
## Black_Female_40_to_49_years Black_Female_50_to_64_years
## 98.421635 66.551531
## Black_Female_65_years_and_over Black_Male_10_to_19_years
## 48.358137 318.032781
## Black_Male_20_to_29_years Black_Male_30_to_39_years
## 89.283293 87.978290
## Black_Male_40_to_49_years Black_Male_50_to_64_years
## 91.913602 64.235719
## Black_Male_65_years_and_over Other_Female_10_to_19_years
## 37.575659 143.610940
## Other_Female_20_to_29_years Other_Female_30_to_39_years
## 65.320481 55.395405
## Other_Female_40_to_49_years Other_Female_50_to_64_years
## 222.043147 132.105354
## Other_Female_65_years_and_over Other_Male_10_to_19_years
## 82.816114 153.770551
## Other_Male_20_to_29_years Other_Male_30_to_39_years
## 54.915467 63.326933
## Other_Male_40_to_49_years Other_Male_50_to_64_years
## 241.793319 174.690518
## Other_Male_65_years_and_over Unemployment_rate
## 53.654443 1.496689
## Poverty_rate Population_log
## 1.412607 3.416913
## police_per_100k_lag
## 1.393440
vif_LOTT <- vif(lm_LOTT)
vif_LOTT <- vif_LOTT %>%
as_tibble() %>%
cbind(., names(vif_LOTT)) %>%
as_tibble()
colnames(vif_LOTT) <- c("VIF", "Variable")
max_vif_LOTT <- max(vif(lm_LOTT))## [1] 1.685342
## [1] 330.9829
\[\frac{1}{1-R_{i}^{2}}\]
vif_DONOHUE$Analysis <- "Analysis 1"
vif_LOTT$Analysis <- "Analysis 2"
vif_df <- rbind(vif_DONOHUE,
vif_LOTT)
vif_plot <- vif_df %>%
ggplot(aes(x = Analysis, y = VIF)) +
geom_jitter(width = 0.1, alpha = 0.5, size = 2) +
geom_hline(yintercept = 10, color = "red") +
scale_y_continuous(trans = 'log10',
limits = c(1,1000)) +
labs(title = "Variance inflation factors") +
theme_minimal() +
theme(axis.title.x = element_blank())
vif_plot